In order to study the ways in which our country’s weather has behaved in the past years, we decided to look for a data set on different sources from the Internet. Unfortunately, we have not found any data set containing the metrics we need, therefore we decided to use the Visual Crossing Weather API, which would give us the weather conditions in Romania in a set of locations and time periods we chose to specify.
The period of time has been set to a daily basis from Jan 1st 2011, until Dec 31st 2021. This choice has been made due to financial limitations we encountered on the API’s part. In case that we had wanted to procure more data, more requests would have been necessary to be made to the public API, exceeding the threshold of free requests.
For each request, the API will return a single .csv file, respecting the name convention weather_YEAR_COUNTY.csv. So, for 41 Romanian counties and 11 years, we would obtain 451 csv files. However, we need to work with a single data set, so we decided to merge all of them into a single one in the code block below.
library(tidyverse)
library(ggplot2)
library(plotly)
# TRUE only at first when "weather_2011-2021_Romania.csv" is not present in our project
MERGE_ALL_DATASETS = FALSE
if (MERGE_ALL_DATASETS) {
filenames_list = list.files(
path = "../weather_data/",
pattern = "*.csv",
full.names = TRUE
)
data = lapply(filenames_list, read_csv) %>% bind_rows()
write.csv(data, "weather_2011-2021_Romania.csv", row.names = FALSE, na = "")
}
data = read_csv("weather_2011-2021_Romania.csv", show_col_types = FALSE)
rm(MERGE_ALL_DATASETS)TODO: Describe the meaning of each relevant column using this documentation
The data set we obtained is mostly in a good shape. However, on a quick scan, we identified a few aspects that required some attention before diving into the exploratory analysis (e.g. splitting complex columns into multiple simpler ones).
In order to add value to the data set, we also decided to do some feature engineering, and create additional columns that would enhance the diversity of the data types. This diversity will allow for creating new types of plots and open multiple questions with possibly more surprising insights in the Romanian weather.
We start by removing blank spaces from column names for easier writing of the R code (e.g. “Minimum Temperature” becomes “MinimumTemperature”) and redundant columns ResolvedAddress and Name.
names(data) = gsub(' ', '', names(data))
data = data %>% select(-c("ResolvedAddress", "Name"))The data set contains two columns named Weather Type and Conditions that contain strings of different weather conditions identified at a specific time and location, each condition being separated by a comma. However, there can be multiple conditions reported in a day, so we decided that, instead of having a single column describing all of the conditions, we could create a separate column for each weather type and condition.
As a result, all of these new columns will be of boolean type, describing whether the weather has been or not identified in the given condition. There are 37 possible weather types and 4 possible conditions in the data set, therefore 41 new columns will be created.
accumulate_weather_func = function(accumulated_types, weather) {
if (!is.na(weather)) {
conditions = strsplit(weather, ", ")[[1]]
accumulated_types = union(accumulated_types, conditions)
}
return(accumulated_types)
}
# Compute the list of possible weather types in this data set
weather_types = reduce(data$WeatherType, accumulate_weather_func, .init = c())
cat("Number of possible weather types:", length(weather_types))## Number of possible weather types: 37
# Create the weather type columns
for (weather_type in weather_types) {
column_name = gsub(' ', '', weather_type) # remove spaces
column_name = gsub('/', 'Or', column_name) # replace slashes with Or
data[[column_name]] = with(
data,
# If Weather is NA, then new column will contain NA as well
if_else(
is.na(WeatherType),
NA,
# If Weather is defined, check that condition is present => set TRUE/FALSE
if_else(
grepl(weather_type, WeatherType, fixed = TRUE),
TRUE,
FALSE
)
)
)
}
rm(column_name, weather_type)
# Display an example of some weather columns
data %>%
slice(42:46) %>%
select(c(WeatherType, Fog, LightRain, SnowShowers, Duststorm))## # A tibble: 5 x 5
## WeatherType Fog LightRain SnowShowers Duststorm
## <chr> <lgl> <lgl> <lgl> <lgl>
## 1 Mist, Fog, Light Rain TRUE TRUE FALSE FALSE
## 2 Snow Showers, Mist, Rain, Rain Showers,~ FALSE TRUE TRUE FALSE
## 3 Mist FALSE FALSE FALSE FALSE
## 4 <NA> NA NA NA NA
## 5 Mist FALSE FALSE FALSE FALSE
# Compute the list of possible conditions in this data set
conditions = reduce(data$Conditions, accumulate_weather_func, .init = c())
cat("Number of possible conditions:", length(conditions))## Number of possible conditions: 4
# Create the conditions columns
for (condition in conditions) {
column_name = paste0("Condition", condition)
if (condition == "Partially cloudy") {
column_name = "ConditionPartiallyCloudy"
}
data[[column_name]] = with(
data,
# If Condition is NA, then new column will contain NA as well
if_else(
is.na(Conditions),
NA,
# If Condition is defined, check that condition is present => set TRUE/FALSE
if_else(
grepl(condition, Conditions, fixed = TRUE),
TRUE,
FALSE
)
)
)
}
rm(column_name, condition)
# Display an example of the conditions columns
data %>%
slice(11:15) %>%
select(c(Conditions, ConditionClear, ConditionPartiallyCloudy, ConditionOvercast, ConditionRain))## # A tibble: 5 x 5
## Conditions ConditionClear ConditionPartia~ ConditionOverca~ ConditionRain
## <chr> <lgl> <lgl> <lgl> <lgl>
## 1 Clear TRUE FALSE FALSE FALSE
## 2 Partially clou~ FALSE TRUE FALSE FALSE
## 3 Overcast FALSE FALSE TRUE FALSE
## 4 Overcast FALSE FALSE TRUE FALSE
## 5 Rain, Overcast FALSE FALSE TRUE TRUE
As the data is currently offered, the wind direction is expressed as a real number, whose value ranges between 0 and 360 degrees. We can express this value under a categorical form, by assigning a named direction for each specific division on the trigonometric circle. In this way, we can obtain 16 different categories for 16 subdivisions, as can be checked in the figure below.
wind_direction_table = data.frame(
"Cardinal_Point" = c("N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE", "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"),
"From" = c(348.75, seq(11.25, 326.25, 22.5)),
"To" = seq(11.25, 348.75, 22.5)
)Table for Wind Cardinal Directions and Degrees
Using the above table, we can take each value from the WindDirection column and build the new column WindCardinalDirection.
# Break down the numeric direction into more categories
data = data %>%
mutate(
WindCardinalDirection = cut(
x = WindDirection,
breaks = c(0, wind_direction_table$To, 360),
labels = c(wind_direction_table$Cardinal_Point, "N")
)
)
# Display an example of some wind directions
data %>%
slice_sample(n=5) %>%
select(c(WindDirection, WindCardinalDirection))## # A tibble: 5 x 2
## WindDirection WindCardinalDirection
## <dbl> <fct>
## 1 230. SW
## 2 304. NW
## 3 240. WSW
## 4 174. S
## 5 277. W
The data set presents a column called Visiblity that represents the distance, measured in kilometers, at which an object or light can be clearly discerned.
As mentioned by the National Weather Service at this location, we can break down these distances into 4 sets, creating a new column called VisibilityLevel, which describes an ordered categorical variable.
The 4 divisions are described as being the following:
data = data %>%
mutate(
VisibilityLevel = cut(
x = Visibility,
breaks = c(0, 0.926, 3.704, 9.26, +Inf),
labels = c("Very Poor", "Poor", "Moderate", "Good"),
include.lowest = TRUE
)
)
# Display an example of some visibility levels
data %>%
slice_sample(n=5) %>%
select(c(Visibility, VisibilityLevel))## # A tibble: 5 x 2
## Visibility VisibilityLevel
## <dbl> <fct>
## 1 16.5 Good
## 2 13.1 Good
## 3 10.2 Good
## 4 10.4 Good
## 5 13.3 Good
The column Address contains the name of each county seat in our country, followed by , Romania, but we would like to have a column called County that contains only the name of the county (not the county seat) to use while plotting the Romania’s map. It is important that the county names computed here to be the same as the ones employed in the geoJson file containing data about drawing the counties’ borders (e.g. must rename Bucharest to Bucuresti).
In addition to this, we also create a new column Mnemonic which can be used in plotting interactive maps.
romania_counties = c("Alba","Arad","Arges","Bacau","Bihor","Bistrita-Nasaud","Botosani","Brasov","Braila","Bucuresti","Buzau","Caras-Severin","Calarasi","Cluj","Constanta","Covasna","Dambovita","Dolj","Galati","Giurgiu","Gorj","Harghita","Hunedoara","Ialomita","Iasi","Maramures","Mehedinti","Mures","Neamt","Olt","Prahova","Satu Mare","Salaj","Sibiu","Suceava","Teleorman","Timis","Tulcea","Vaslui","Valcea","Vrancea")
romania_county_seats = c("Alba Iulia","Arad","Pitesti","Bacau","Oradea","Bistrita","Botosani","Brasov","Braila","Bucuresti","Buzau","Resita","Calarasi","Cluj-Napoca","Constanta","Sfantu Gheorghe","Targoviste","Craiova","Galati","Giurgiu","Targu Jiu","Miercurea Ciuc","Deva","Slobozia","Iasi","Baia Mare","Drobeta-Turnu Severin","Targu Mures","Piatra Neamt","Slatina","Ploiesti","Satu Mare","Zalau","Sibiu","Suceava","Alexandria","Timisoara","Tulcea","Vaslui","Ramnicu Valcea","Focsani")
romania_mnemonics = c("AB","AR","AG","BC","BH","BN","BT","BV","BR","B","BZ","CS","CL","CJ","CT","CV","DB","DJ","GL","GR","GJ","HR","HD","IL","IS","MM","MH","MS","NT","OT","PH","SM","SJ","SB","SV","TR","TM","TL","VS","VL","VN")
# First create a column CountySeat, translate the special Bucharest case,
# and then, based on the above lists, create the final column County
data = data %>%
mutate(CountySeat = gsub(', Romania', '', Address)) %>%
mutate(CountySeat = ifelse(CountySeat == "Bihor", "Oradea", CountySeat)) %>%
mutate(
CountySeat = ifelse(
CountySeat == "Bucharest",
"Bucuresti",
CountySeat
)
) %>%
mutate(
County = plyr::mapvalues(
x = CountySeat,
from = romania_county_seats,
to = romania_counties
),
Mnemonic = plyr::mapvalues(
x = CountySeat,
from = romania_county_seats,
to = romania_mnemonics
)
)
# Display an example of several counties
data %>%
slice_sample(n=5) %>%
select(c(Address, CountySeat, County, Mnemonic))## # A tibble: 5 x 4
## Address CountySeat County Mnemonic
## <chr> <chr> <chr> <chr>
## 1 Botosani, Romania Botosani Botosani BT
## 2 Zalau, Romania Zalau Salaj SJ
## 3 Drobeta-Turnu Severin, Romania Drobeta-Turnu Severin Mehedinti MH
## 4 Timisoara, Romania Timisoara Timis TM
## 5 Resita, Romania Resita Caras-Severin CS
We were not sure what the API will return when calling it to get Ilfov’s weather (might have returned Bucharest’s weather, which is inside it), so we decided to pass this county’s API calls in order to allow for other free calls for other counties.
In this project we will suppose that Ilfov and Bucharest share the same weather.
ilfov_data = data %>%
filter(County == "Bucuresti") %>%
mutate(
Address = "Ilfov, Romania",
County = "Ilfov",
Mnemonic = "IF",
CountySeat = "Bucuresti",
Latitude = 44.5355,
Longitude = 26.2324
)
data = rbind(data, ilfov_data)
rm(ilfov_data)
romania_counties = append(romania_counties, "Ilfov")
romania_county_seats = append(romania_county_seats, "Bucuresti")
romania_mnemonics = append(romania_mnemonics, "IF")For some of our plots we detected a need to compare different seasons of the year, so we decided to create a new column specifically for this scope.
compute_season_from_date = function(date_col) {
month_col = substr(date_col, 1, 2)
seasons_col = map(month_col, function(month) {
if (month %in% c("03", "04", "05")) {
return("Spring")
}
if (month %in% c("06", "07", "08")) {
return("Summer")
}
if (month %in% c("09", "10", "11")) {
return("Autumn")
}
return("Winter")
})
return(as.character(seasons_col))
}
data = data %>%
mutate(Season = compute_season_from_date(Datetime))…as well as for the month…
compute_month_from_date = function(date_col) {
month_col = substr(date_col, 1, 2)
mapping = list(
"01" = "January",
"02" = "February",
"03" = "March",
"04" = "April",
"05" = "May",
"06" = "June",
"07" = "July",
"08" = "August",
"09" = "September",
"10" = "October",
"11" = "November",
"12" = "December"
)
return(as.character(mapping[month_col]))
}
months_names = c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")
data = data %>%
mutate(Month = compute_month_from_date(Datetime)) %>%
mutate(Month = factor(Month, levels = months_names))First we define the data set portion that we are going to use in this analysis.
# Select relevant columns for this analysis.
data_wt_cond = data %>%
select(
"Season", "County", "Mnemonic", "Latitude", "Longitude",
"Mist":"HeavyFreezingRain",
"ConditionPartiallyCloudy":"ConditionRain"
)
wt_cols = 6:42
cond_cols = 43:46There is a wide variety of weather types that can be identified in Romania during the years’ four seasons. We have identified 37 special ones, as labeled by the Visual Crossing Weather API, but what is the most common unusual weather in Romania? We could first take a look at the exact number of occurrences for each weather type.
# Create data frame for the first plot
data_wt_cond_1 = data_wt_cond %>%
summarise(across(all_of(wt_cols), sum, na.rm = TRUE)) %>%
gather(key = "WeatherType", value = "Occurrences") %>%
arrange(Occurrences) %>%
mutate(WeatherType = factor(WeatherType, WeatherType))
total_occurrences = (data_wt_cond_1 %>% summarise(sum(Occurrences)))[[1]]
data_wt_cond_1 = data_wt_cond_1 %>%
mutate(OccurrencesFreq = round(Occurrences / total_occurrences * 100, 3))
# Define text label function
get_occ_text = function(type_col, occ_col, occ_freq_col) {
texts = list()
for (i in 1:length(occ_col)) {
text = paste0(
"Weather: ", type_col[i], "\n",
occ_freq_col[i], "% of Unusual Weather\n",
occ_col[i], " observed occurrences"
)
texts = append(texts, text)
}
return(texts)
}
plot_wt_cond_1 = data_wt_cond_1 %>%
ggplot(aes(
x = WeatherType,
y = OccurrencesFreq,
text = get_occ_text(WeatherType, Occurrences, OccurrencesFreq)
)) +
geom_segment(
aes(x = WeatherType, xend = WeatherType, y = 0, yend = OccurrencesFreq),
color = "#82a3e0",
alpha = 1
) +
geom_point(color="#82a3e0", size=2, alpha=1) +
coord_flip() +
labs(
title = "Frequency of Unusual Weather Conditions in Romania",
x = "",
y = "% of unusual weather occurrences"
)
rm(data_wt_cond_1, total_occurrences)
ggplotly(plot_wt_cond_1, tooltip = "text")rm(plot_wt_cond_1)The lollipop plot from above tells us that the Mist weather has the biggest amount of occurrences during 2011-2021 when summing up all the Romania’s counties. This might be explained by the counties found in the Carpathian Mountains where this type of weather is usually more manifested, together with Rain, Fog, and Snow.
In order to confirm this hypothesis, we decided to plot the most unusual weather for each county and check the results in the following map.
library(geojsonio)
library(broom)
romania_spdf = geojson_read("romania.geojson", what = "sp")
romania_map = tidy(romania_spdf, region = "name")
data_wt_cond_2 = data_wt_cond %>%
select(c(County, all_of(wt_cols))) %>%
group_by(County) %>%
summarise(across(1:37, sum, na.rm = TRUE)) %>%
gather(key = "WeatherType", value = "Occurrences", -County) %>%
group_by(County) %>%
filter(Occurrences == max(Occurrences)) %>%
mutate(WeatherType = factor(WeatherType, WeatherType)) %>%
select(-Occurrences) %>%
ungroup()
map_wt_cond_2 = romania_map %>%
left_join(. , data_wt_cond_2, by = c("id" = "County"))
ggplot() +
geom_polygon(
data = map_wt_cond_2,
aes(fill = WeatherType, x = long, y = lat, group = group),
color="#dbdbdb"
) +
coord_map() +
theme_void() +
theme(plot.title = element_text(hjust = 0.5)) +
scale_fill_manual(
values = c("#b1b3c7", "#848bc4", "#888a99", "#343873"),
labels = c("Mist", "Sky Coverage Increasing", "Fog", "Rain")
) +
labs(
title = "Most Unusual Weather Conditions in Romania",
fill = "Weather Type"
)rm(romania_map, romania_spdf, data_wt_cond_2, map_wt_cond_2)As it is the case here, it seems that the Mist weather is encountered mostly in the southern and eastern parts of the country, while the Oriental Carpathians present a Fog weather, with mostly Rainy days in Mures and Sibiu, inside the Transylvanian Plateau. Note that Mist is also a type of Fog, but one that contains less condensed water drops.
Here we look into the 4 conditions the data set has to offer related to the Conditions column. In the pie chart below you can see the overall distribution of the weather conditions in the entire period among all the Romania’s counties, and it looks like the most part of the weather is Partially Cloudy.
data_wt_cond_3 = data_wt_cond %>%
summarise(across(all_of(cond_cols), sum, na.rm = TRUE)) %>%
gather(key = "Condition", value = "Occurrences") %>%
arrange(desc(Occurrences)) %>%
mutate(Condition = factor(Condition, Condition)) %>%
mutate(ypos = cumsum(Occurrences) - 0.5 * Occurrences )
total_occurrences = (data_wt_cond_3 %>% summarise(sum(Occurrences)))[[1]]
data_wt_cond_3 = data_wt_cond_3 %>%
mutate(OccurrencesFreq = round(Occurrences / total_occurrences * 100, 2))
data_wt_cond_3 %>%
ggplot(aes(x = "", y = Occurrences, fill = Condition)) +
geom_bar(stat = "identity", width = 1, color="white") +
coord_polar("y", start = 0) +
theme_void() +
theme(plot.title = element_text(hjust = 0.5)) +
labs(title = "Distribution of registered Sky Conditions") +
scale_fill_manual(
values = c("#0c9fc7", "#041b8f", "#02d7e3", "#363e66"),
labels = c("Partially Cloudy", "Rain", "Clear", "Overcast")
) +
geom_text(
aes(y = total_occurrences - ypos, label = paste0(OccurrencesFreq, "%")),
color = "white",
size = 6
)rm(data_wt_cond_3, total_occurrences)The next part would be to plot these conditions on the Romania’s map, however, we decided to take a step further, and plot their distributions among the four different seasons. Down below you can observe the most observed weather condition for each county during spring, summer, autumn, and winter time.
library(leaflet)
romania_spdf = geojson_read("romania.geojson", what = "sp")
data_wt_cond_4 = data_wt_cond %>%
select(c(Season, County, Mnemonic, all_of(cond_cols))) %>%
group_by(Season, County, Mnemonic) %>%
summarise(across(1:4, sum, na.rm = TRUE)) %>%
gather(key = "SkyCondition", value = "Occurrences", -c(County, Season, Mnemonic)) %>%
group_by(Season, County, Mnemonic) %>%
filter(Occurrences == max(Occurrences)) %>%
mutate(SkyCondition = factor(SkyCondition, SkyCondition)) %>%
ungroup()
# Surprisingly, during Winter, Timis has both Rain and Partialy Cloudy as the most
# encountered conditions (both with 140 occurences), so we must pick one to display.
# We chose Partially Cloudy, since it looks like this is the trend in Romania
data_wt_cond_4 = data_wt_cond_4 %>%
filter(County != "Timis" | Season != "Winter" | SkyCondition != "ConditionRain")
convert_cond_col_to_text = function(cond_col) {
from_arr = c("ConditionClear", "ConditionPartiallyCloudy", "ConditionOvercast", "ConditionRain")
to_arr = c("Clear", "Partially Cloudy", "Overcast", "Rain")
return(plyr::mapvalues(
x = cond_col,
from = from_arr,
to = to_arr
))
}
plot_conditions_distribution_for_season = function(season_name, season_palette) {
data_wt_cond_season_4 = data_wt_cond_4 %>%
filter(Season == season_name) %>%
select(County, SkyCondition)
map_wt_season_4 = romania_spdf
map_wt_season_4@data = map_wt_season_4@data %>%
left_join(data_wt_cond_season_4, by = c("name" = "County"))
cond_palette = colorFactor(
palette = season_palette,
levels = c("ConditionClear", "ConditionPartiallyCloudy", "ConditionOvercast", "ConditionRain"),
domain = map_wt_season_4@data$SkyCondition,
na.color = "transparent"
)
season_labels <- sprintf(
"<strong>%s</strong><br/>Condition: <em>%s</em>",
map_wt_season_4$name,
convert_cond_col_to_text(map_wt_season_4$SkyCondition)
) %>% lapply(htmltools::HTML)
plot = leaflet(
map_wt_season_4,
options = leafletOptions(zoomSnap = 0.5, zoomDelta = 0.5)
) %>%
setView(25, 46, 6.5) %>%
addProviderTiles(providers$OpenStreetMap.HOT) %>%
addPolygons(
fillColor = ~cond_palette(SkyCondition),
opacity = 1,
color = "#dbdbdb",
weight = 2,
fillOpacity = 0.7,
highlightOptions = highlightOptions(
weight = 4,
color = "#6b6b6b",
fillOpacity = 0.7,
bringToFront = TRUE
),
label = season_labels,
labelOptions = labelOptions(
style = list("font-weight" = "normal", padding = "3px 8px"),
textsize = "15px",
direction = "auto"
)
) %>%
addLegend(
pal = cond_palette,
values = ~SkyCondition,
opacity = 0.7,
title = paste(season_name, "Conditions"),
position = "topright",
labFormat = labelFormat(transform = convert_cond_col_to_text)
)
return(plot)
}
spring_palette = c("#79f520", "#6aba2d", "#357a18", "#2c6314")
summer_palette = c("#f5f120", "#bab72d", "#7a7818", "#636214")
autumn_palette = c("#f56b20", "#ba5e2d", "#7a3918", "#632c14")
winter_palette = c("#20d9f5", "#2daaba", "#18737a", "#0e3b40")
plot_conditions_distribution_for_season("Spring", spring_palette)plot_conditions_distribution_for_season("Summer", summer_palette)plot_conditions_distribution_for_season("Autumn", autumn_palette)plot_conditions_distribution_for_season("Winter", winter_palette)rm(spring_palette, summer_palette, autumn_palette, winter_palette)
rm(data_wt_cond_4, data_wt_cond, wt_cols, cond_cols)We can conclude this section with a few observations:
In this section we are going to look at the first thing that comes to one’s mind when talking about weather: Temperature.
We will first look at the distribution of the minimum, maximum and average daily temperature across all Romania’s counties during the entire period 2011-2021. This result is achieved by averaging the temperatures for each day across all the 42 counties, and then displaying them into the violin plot from below.
data_temp_1 = data %>%
select(Datetime, MinimumTemperature, MaximumTemperature, Temperature) %>%
group_by(Datetime) %>%
summarise(
Minimum = mean(MinimumTemperature, na.rm = TRUE),
Maximum = mean(MaximumTemperature, na.rm = TRUE),
Average = mean(Temperature, na.rm = TRUE)
) %>%
select(-Datetime) %>%
gather(key = "Temperature", value = "Value") %>%
mutate(Temperature = factor(Temperature, levels=c("Minimum", "Average", "Maximum")))
plot_temp_1 = ggplot(data_temp_1, aes(x=Temperature, y=Value, fill=Temperature)) +
geom_violin(alpha=0.8) +
labs(
title = "Temperature distribution in Romania during 2011-2021",
x = "Daily Temperatures",
y = ""
) +
scale_fill_manual(
name = "", # remove legend title
values = c("#1922d4", "#d4cb19", "#d41919"),
labels = c("Minimum", "Average", "Maximum")
) +
scale_y_continuous(
breaks = seq(-20, 40, 5),
limits = c(-20, 40),
labels = function (x) paste(x, "°C")
) +
theme(plot.title = element_text(hjust = 0.5))
ggplotly(plot_temp_1, tooltip = "text")rm(plot_temp_1, data_temp_1)A few of the notable aspects to observe in this plot are the following:
A reason for the appearance of two modes in each distribution can be given by the extreme temperature differences between summer and winter seasons. If we analyzed the temperatures distribution for each season, we would find without any surprise significant differences, which is what we have done in the following plot.
data_temp_2 = data %>%
select(Season, Datetime, MinimumTemperature, MaximumTemperature, Temperature) %>%
group_by(Datetime, Season) %>%
summarise(
Minimum = mean(MinimumTemperature, na.rm = TRUE),
Maximum = mean(MaximumTemperature, na.rm = TRUE),
Average = mean(Temperature, na.rm = TRUE)
) %>%
ungroup() %>%
select(-Datetime) %>%
gather(key = "Temperature", value = "Value", 2:4) %>%
mutate(Temperature = factor(Temperature, levels=c("Minimum", "Average", "Maximum"))) %>%
mutate(Season = factor(Season, levels=c("Spring", "Summer", "Autumn", "Winter")))
plot_temp_2 = ggplot(data_temp_2, aes(x=Temperature, y=Value, fill=Temperature)) +
geom_violin(alpha=0.8) +
facet_wrap(~Season) +
labs(
title = "Temperature distribution in Romania by seasons",
x = "Daily Temperatures",
y = ""
) +
scale_fill_manual(
name = "", # remove legend title
values = c("#1922d4", "#d4cb19", "#d41919"),
labels = c("Minimum", "Average", "Maximum")
) +
scale_y_continuous(
breaks = seq(-20, 40, 5),
limits = c(-20, 40),
labels = function (x) paste(x, "°C")
) +
theme(plot.title = element_text(hjust = 0.5))
ggplotly(plot_temp_2, tooltip = "text")Looking at other time frames, we decided to look at the month level, and compare the daily average temperatures for each month in Romania over the last 11 years. This way we can conclude which would usually be the hottest/coldest months in Romania.
library(ggridges)
rm(plot_temp_2, data_temp_2)
data_temp_3 = data %>%
mutate(Month = factor(Month, levels = rev(months_names)))
ggplot(data_temp_3, aes(x = Temperature, y = Month, fill = ..x..)) +
geom_density_ridges_gradient(scale = 3, rel_min_height = 0.01) +
scale_fill_viridis_c(option = "C") +
theme(
legend.position="none",
plot.title = element_text(hjust = 0.5)
) +
scale_x_continuous(
breaks = seq(-20, 32.5, 5),
limits = c(-20, 32.5),
labels = function (x) paste(x, "°C")
) +
labs(
title = "Temperature distribution in Romania by months (2011-2021)",
x = "Daily Average Temperatures",
y = ""
)rm(data_temp_3)Judging by the ridgeline plot from above, we can tell that August is usually the hottest month of the year, with a mean of around 23 °C, while January can represent the coldest month of the year with a mean of around 0 °C, having slightly more lower temperatures than in December.
Keeping in mind that we can investigate temperatures at a month level, we tried to go back to Minimums, Maximums and Averages, and compute their means for the entire country and depict them in the grouped barplot below.
data_temp_4 = data %>%
group_by(Month) %>%
summarise(
Minimum = mean(MinimumTemperature, na.rm = TRUE),
Maximum = mean(MaximumTemperature, na.rm = TRUE),
Average = mean(Temperature, na.rm = TRUE)
) %>%
gather(key = "Temperature", value = "Value", Minimum, Maximum, Average) %>%
mutate(Temperature = factor(Temperature, levels = c("Maximum", "Average", "Minimum"))) %>%
mutate(Month = factor(Month, rev(months_names)))
plot_temp_4 = ggplot(
data_temp_4,
aes(
x = Month,
y = Value,
fill = Temperature,
text = paste("Mean of", paste0(Temperature, "s:"), round(Value, 2), "°C")
)
) +
geom_bar(position = "dodge", stat = "identity", alpha = 0.7) +
coord_flip() +
theme(plot.title = element_text(hjust = 0.5)) +
scale_y_continuous(
breaks = seq(-5, 30, 5),
limits = c(-5, 30),
labels = function (x) paste(x, "°C")
) +
scale_fill_manual(
name = "", # remove legend title
values = c("#d41919", "#d4cb19", "#1922d4"),
labels = c("Minimum", "Average", "Maximum")
) +
labs(
title = "Mean Temperatures in Romania by months (2011-2021)",
y = "Daily Temperatures",
x = ""
)
ggplotly(plot_temp_4, tooltip = "text")rm(plot_temp_4, data_temp_4)small_data = data %>% group_by(Address) %>% filter(row_number() == 1)
ggplot(small_data, aes(x=Latitude, y=Longitude)) +
geom_point(color="orange",
fill="#69b3a2",
shape=21,
alpha=0.5,
size=6,
stroke = 2)+geom_text(
label=rownames(small_data),
nudge_x = 0.25, nudge_y = 0.25,
check_overlap = T
)